BUG: .describe() doesn't work for EAs #61707 #61760

Closed

Conversation

kernelism

This PR fixes a bug where Series.describe() fails on certain ExtensionArray dtypes, such as pint[kg], because it attempts to cast the result to Float64Dtype. Some of the produced statistics cannot be cast to float, which raises errors such as DimensionalityError.

We now avoid forcing a Float64Dtype return dtype when the EA’s scalar values cannot be safely cast. Instead, if the EA produces outputs with mixed dtypes, the result is returned with dtype=None.
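
To illustrate the idea, here is a minimal, pandas-free sketch of choosing the result dtype based on whether the statistics can be cast to float. It is not the code in this PR: `Quantity` is a hypothetical stand-in for a unit-carrying scalar such as a pint quantity, and `choose_describe_dtype` is a hypothetical helper that returns the string "Float64" in place of `Float64Dtype()`.

```python
from __future__ import annotations


class Quantity:
    """Hypothetical stand-in for a unit-carrying scalar (not pint's class)."""

    def __init__(self, magnitude: float, unit: str) -> None:
        self.magnitude = magnitude
        self.unit = unit

    def __float__(self) -> float:
        # Mimic a unit library refusing to silently drop the unit.
        raise TypeError(f"cannot convert quantity with unit {self.unit!r} to float")


def choose_describe_dtype(stats: list[object]) -> str | None:
    """Return "Float64" if every statistic casts to float, else None.

    Hypothetical helper: None means "do not force a dtype, let it be inferred".
    """
    for value in stats:
        try:
            float(value)
        except (TypeError, ValueError):
            return None
    return "Float64"


# count, mean, min for a plain numeric series: everything casts to float.
plain_stats = [3, 2.0, 1.0]
# count is a bare int, mean and min carry a unit: the cast would fail.
unit_stats = [3, Quantity(2.0, "kg"), Quantity(1.0, "kg")]

print(choose_describe_dtype(plain_stats))
print(choose_describe_dtype(unit_stats))
```

With this approach the count stays a plain integer and the unit-carrying statistics keep their units, at the cost of an object-like result instead of a uniform float column.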

kernelism and others added 30 commits July 2, 2025 17:02
* DEPR: object inference in to_stata

* Whatsnew

* Fix broken test

* alphabetize
…as-dev#61767)

Revert "ENH: Allow third-party packages to register IO engines (pandas-dev#61642)"

This reverts commit 9dcce63.
* CLN: remove and update for outdated _item_cache

* CLN: remove outdated _item_cache in comment

* CLN: rollback unittest unrelated to _item_cache
* PERF: avoid object-dtype path in ArrowEA._explode

* typo fixup
pandas-dev#61773)

* BUG: Decimal(NaN) incorrectly allowed in ArrowEA constructor with timestamp type

* GH ref

* BUG: ArrowEA constructor with timestamp type

* mypy fixup

* mypy fixup
…1785)

* REF: remove unreachable, stronger typing in parsers.pyx

* mypy fixup
* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.11.12 → v0.12.2](astral-sh/ruff-pre-commit@v0.11.12...v0.12.2)
- [github.com/MarcoGorelli/cython-lint: v0.16.6 → v0.16.7](MarcoGorelli/cython-lint@v0.16.6...v0.16.7)
- [github.com/pre-commit/mirrors-clang-format: v20.1.5 → v20.1.7](pre-commit/mirrors-clang-format@v20.1.5...v20.1.7)
- [github.com/trim21/pre-commit-mirror-meson: v1.8.1 → v1.8.2](trim21/pre-commit-mirror-meson@v1.8.1...v1.8.2)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Rename method

* ignore PLW0177

* Noqa test

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Matthew Roeschke <[email protected]>
* Bump numpy

* Bump numpy

* Bump tzdata

* ignore pytables usage, update xfail condition
…_csv (pandas-dev#61650)

* feature pandas-dev#49580: support new-style float_format string in to_csv

feat(to_csv): support new-style float_format strings using str.format

Detect and process new-style format strings (e.g., "{:,.2f}") in the
float_format parameter of to_csv.

- Check if float_format is a string and matches new-style pattern
- Convert it to a callable (e.g., lambda x: float_format.format(x))
- Ensure compatibility with NaN values and mixed data types
- Improves formatting output for floats when exporting to CSV

Example:
df = pd.DataFrame([1234.56789, 9876.54321])
df.to_csv(float_format="{:,.2f}")  # now outputs formatted values like 1,234.57

Co-authored-by: Pedro Santos <[email protected]>

* update benchmark test

* fixed pre commit

* fixed offsets.pyx

* fixed tests to windows

* Update pandas/io/formats/format.py

Co-authored-by: Matthew Roeschke <[email protected]>

* Update pandas/io/formats/format.py

Co-authored-by: Matthew Roeschke <[email protected]>

* Update pandas/io/formats/format.py

Co-authored-by: Matthew Roeschke <[email protected]>

* updated v3.0.0.rst and fixed tm.assert_produces_warning

* fixed test_new_style_with_mixed_types_in_column added match to assert_produces_warning

* Update doc/source/whatsnew/v3.0.0.rst (removed reference to this PR)

Co-authored-by: Simon Hawkins <[email protected]>

* fixed pre-commit

* removed tm.assert_produces_warning

* fixed space

* fixed pre-commit

---------

Co-authored-by: Pedro Santos <[email protected]>
Co-authored-by: Matthew Roeschke <[email protected]>
Co-authored-by: Simon Hawkins <[email protected]>
…andas-dev#61727)

* TST: update expecteds for using_string_dtype to fix xfails

* Update to_dict_of_blocks test to hardcode object dtype

* Comment

* Split test, update expected, targeted xfails

* Update json test

* revert commented-out
* DOC: Update link to pytz documentation

* Update the pytz link per the suggestion
heoh and others added 20 commits July 11, 2025 15:08
…ment (pandas-dev#61827)

* DOC: Correct error message in AbstractMethodError for methodtype argument

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
fix(doc): rm excessive backtick
…e' and 'Docs' properly (pandas-dev#61836)

* DOC: Update README.md to proper link to issues related to Docs

* DOC: Update README.md to proper link to issues related to 'good first issue'
def test_describe_multiple_dtypes(self):
    """
    GH61707: describe() doesn't work on EAs which generate
    statistics with multiple dtypes.
Member

Nitpick: can this be a comment instead of a docstring?

@@ -215,6 +216,14 @@ def reorder_columns(ldesc: Sequence[Series]) -> list[Hashable]:
    return names


def has_multiple_internal_dtypes(d: list[Any]) -> bool:
Member

I think this can be inlined since it is only used once.

@@ -251,6 +260,10 @@ def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
import pyarrow as pa

dtype = ArrowDtype(pa.float64())
elif has_multiple_internal_dtypes(d):
# GH61707: describe() doesn't work on EAs
# with multiple internal dtypes, so return object dtype
Member

Is the relevant characteristic "multiple internal dtypes" or "entries that can't be cast to Float64"?

Author

The latter makes more sense.
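
A small sketch of why the two criteria differ, under stated assumptions: the body of `has_multiple_internal_dtypes` is not shown in this thread, so `has_multiple_types` below is only a guess at a "multiple types" check, and `Kilograms` and `all_castable_to_float` are hypothetical names used for illustration.

```python
from __future__ import annotations


class Kilograms:
    """Hypothetical unit-carrying scalar that refuses conversion to float."""

    def __init__(self, magnitude: float) -> None:
        self.magnitude = magnitude

    def __float__(self) -> float:
        raise TypeError("cannot convert a quantity in kg to float")


def has_multiple_types(values: list[object]) -> bool:
    # Assumed reading of "multiple internal dtypes": more than one Python type.
    return len({type(v) for v in values}) > 1


def all_castable_to_float(values: list[object]) -> bool:
    # The alternative criterion: can every entry be cast to float?
    for value in values:
        try:
            float(value)
        except (TypeError, ValueError):
            return False
    return True


# An int count next to a float mean: mixed types, yet safe to cast.
mixed_but_castable = [3, 2.5]
# Only unit-carrying values: a single type, yet not safe to cast.
uniform_but_not_castable = [Kilograms(2.0), Kilograms(1.0)]

for stats in (mixed_but_castable, uniform_but_not_castable):
    print(has_multiple_types(stats), all_castable_to_float(stats))
```

In this sketch the type-count check flags the harmless int/float mix and misses the uniform list of quantities, while the castability check tracks the condition that actually decides whether a Float64 result is possible.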

@kernelism kernelism closed this Jul 20, 2025
@kernelism kernelism deleted the describe-EA-multiple-dtypes-fix branch July 20, 2025 05:48
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: .describe() doesn't work for EAs